1 Designing data marts for data warehouses

October 2001 ACM Transactions on Software Engineering and Methodology (TOSEM), 

Volume 10 Issue 4 
Publisher: ACM Press 

Full text available: pdf(203.43 KB) Additional Information: full citation, abstract, references, citings, index terms, review

Data warehouses are databases devoted to analytical processing. They are used to 
support decision-making activities in most modern business settings, when complex data 
sets have to be studied and analyzed. The technology for analytical processing assumes 
that data are presented in the form of simple data marts, consisting of a well-identified 
collection of facts and data analysis dimensions (star schema). Despite the wide diffusion 
of data warehouse technology and concepts, we still miss me ... 

Keywords: conceptual modeling, data mart, data warehouse, design method, software 
quality management 

2 Snakes and sandwiches: optimal clustering strategies for a data warehouse
H. V. Jagadish, Laks V. S. Lakshmanan, Divesh Srivastava 

June 1999 ACM SIGMOD Record, Proceedings of the 1999 ACM SIGMOD international
conference on Management of data SIGMOD '99, Volume 28 Issue 2
Publisher: ACM Press 

Full text available: pdf(1.47 MB) Additional Information: full citation, abstract, references, citings, index terms

Physical layout of data is a crucial determinant of performance in a data warehouse. The 
optimal clustering of data on disk, for minimizing expected I/O, depends on the query 
workload. In practice, we often have a reasonable sense of the likelihood of different 
classes of queries, e.g., 40% of the queries concern calls made from some specific 
telephone number in some month. In this paper, we address the problem of finding an 
optimal clustering of records of ... 

3 A comparison of data warehousing methodologies
Arun Sen, Atish P. Sinha 

March 2005 Communications of the ACM, volume 48 issue 3 
Publisher: ACM Press 

Full text available: pdf(117.81 KB) Additional Information: full citation, abstract, references, index terms

Using a common set of attributes to determine which methodology to use in a particular
data warehousing project. 

4 Star graphics: An object-oriented implementation
Daniel E. Lipkie, Steven R. Evans, John K. Newlin, Robert L. Weissman 
July 1982 ACM SIGGRAPH Computer Graphics, Proceedings of the 9th annual
conference on Computer graphics and interactive techniques SIGGRAPH '82,
Volume 16 Issue 3
Publisher: ACM Press 

Full text available: pdf(955.07 KB) Additional Information: full citation, abstract, references, citings, index terms

The XEROX Star 8010 Information System features an integrated text and graphics 
editor. The Star hardware consists of a processor, a large bit-mapped display, a keyboard 
and a pointing device. Star's basic graphic elements are points, lines, rectangles, 
triangles, graphics frames, text frames and bar charts. The internal representation is in 
terms of idealized objects that are displayed or printed at resolutions determined by the 
output device. This paper describes the design and implementa ... 

Keywords: Business graphics, Subclassing 

5 Heuristic optimization of OLAP queries in multidimensionally hierarchically clustered
databases

Dimitri Theodoratos, Aris Tsois 

November 2001 Proceedings of the 4th ACM international workshop on Data 
warehousing and OLAP DOLAP '01 

Publisher: ACM Press 

Full text available: pdf(1.44 MB) Additional Information: full citation, abstract, citings, index terms

On-line analytical processing (OLAP) is a technology that encompasses applications 
requiring a multidimensional and hierarchical view of data. OLAP applications often 
require fast response time to complex grouping/aggregation queries on enormous 
quantities of data. Commercial relational database management systems use mainly 
multiple one-dimensional indexes to process OLAP queries that restrict multiple 
dimensions. However, in many cases, multidimensional access methods outperform one- 
dimensiona ... 

6 Bottom-up computation of sparse and Iceberg CUBE
Kevin Beyer, Raghu Ramakrishnan 

June 1999 ACM SIGMOD Record, Proceedings of the 1999 ACM SIGMOD international
conference on Management of data SIGMOD '99, Volume 28 Issue 2
Publisher: ACM Press 

Full text available: pdf(1.49 MB) Additional Information: full citation, abstract, references, citings, index terms

We introduce the Iceberg-CUBE problem as a reformulation of the datacube (CUBE) 
problem. The Iceberg-CUBE problem is to compute only those group-by partitions with an 
aggregate value (e.g., count) above some minimum support threshold. The result of 
Iceberg-CUBE can be used (1) to answer group-by queries with a clause such as HAVING 
COUNT(*) >= X, where X is greater than the threshold, (2) for mining multidimensional 
association rules, and (3) to complement existing strategies for identif ... 

7 Automated data warehousing for rule-based CRM systems
Han-joon Kim, TaeHee Lee, Sang-goo Lee, Jonghun Chun 
January 2003 Proceedings of the 14th Australasian database conference - Volume 17
ADC '03 

Publisher: Australian Computer Society, Inc. 

Full text available: pdf(274.28 KB) Additional Information: full citation, abstract, references, index terms

This paper proposes a novel way of automatically developing data warehouse 
configuration in rule-based CRM systems. Rule-based CRM systems assume that 
marketing activities are represented as a set of IF-WHEN rules. Currently, to provide 
good quality CRM functionalities, CRM systems seek to combine conventional CRM 
methodologies with data warehousing technology. A data warehouse can be abstractly 
seen as a set of materialized views. Selecting views for materialization in a data 
warehouse i ... 

Keywords: CRM, analysis query, data warehouse, materialized view, rules, star-join 
index 



8 Component-driven engineering of database applications
Klaus-Dieter Schewe, Bernhard Thalheim 

January 2006 Proceedings of the 3rd Asia-Pacific conference on Conceptual modelling 
- Volume 53 APCCM '06 

Publisher: Australian Computer Society, Inc. 

Full text available: pdf(188.64 KB) Additional Information: full citation, abstract, references, index terms

Though it is commonly agreed that the design of large database schemata requires group 
effort, database design from component subschemata has not been investigated 
thoroughly. In this paper we investigate snowflake-like subschemata of database 
schemata expressed in the Higher-order Entity-Relationship Model (HERM). These 
subschemata are almost hierarchical in the sense that they may contain cycles in the 
schema, but not in the instances. We show that each HERM schema can be decomposed 
into such ... 

9 Poster papers - short papers: A visual interface technique for exploring OLAP data
with coordinated dimension hierarchies
Mark Sifer

November 2003 Proceedings of the twelfth international conference on Information 
and knowledge management CIKM '03 

Publisher: ACM Press 

Full text available: pdf(272.82 KB) Additional Information: full citation, abstract, references, index terms

Multi-dimensional data occurs in many domains while a wide variety of text based and 
visual interfaces for querying such data exists. But many of these interfaces are not 
applicable to OLAP, as they do not support use of dimension hierarchies for selection and 
aggregation. We introduce an interface technique which supports visual querying of OLAP 
data, that has been implemented in the SGViewer tool. It is based on a data graph rather 
than a data cube representation of the data. Our interface pre ... 

Keywords: OLAP, data exploration, hierarchies, interface 



10 Graphical interaction with heterogeneous databases
T. Catarci, G. Santucci, J. Cardiff 

May 1997 The VLDB Journal — The International Journal on Very Large Data Bases, 

Volume 6 Issue 2 

Publisher: Springer-Verlag New York, Inc. 

Full text available: pdf(602.82 KB) Additional Information: full citation, abstract, citings, index terms

During the past few years our research efforts have been inspired by two different needs. 
On one hand, the number of non-expert users accessing databases is growing apace. On
the other, information systems will no longer be characterized by a single centralized 
architecture, but rather by several heterogeneous component systems. In order to 
address such needs we have designed a new query system with both user-oriented and 
multidatabase features. The system's main components are an adaptive visua ... 

11 Charles W. Bachman interview: September 25-26, 2004; Tucson, Arizona
Thomas Haigh 

January 2006 ACM Oral History interviews 
Publisher: ACM Press 

Full text available: pdf(761.66 KB) Additional Information: full citation, abstract

Charles W. Bachman reviews his career. Born during 1924 in Kansas, Bachman attended 
high school in East Lansing, Michigan before joining the Army Anti Aircraft Artillery Corp, 
with which he spent two years in the Southwest Pacific Theater, during World War II. 
After his discharge from the military, Bachman earned a B.Sc. in Mechanical Engineering 
in 1948, followed immediately by an M.Sc. in the same discipline, from the University of 
Pennsylvania. On graduation, he went to work for Do ... 

12 Session 7: GYO reductions, canonical connections, tree and cyclic schemas and tree
projections

Nathan Goodman, Oded Shmueli, Y. C. Tay 
March 1983 Proceedings of the 2nd ACM SIGACT-SIGMOD symposium on Principles of
database systems PODS '83
Publisher: ACM Press 

Full text available: pdf(1.09 MB) Additional Information: full citation, abstract, references, citings

Database schemas may be partitioned into two sub-classes: tree schemas and cyclic
schemas. The analysis of tree vs cyclic schemas introduced the concepts of GYO 
reductions, canonical connections and tree projections. This paper investigates the 
intricate relationships among these concepts in the context of universal relation 
databases. 

13 The theory of parsing, translation, and compiling
Alfred V. Aho, Jeffrey D. Ullman 

January 1972 Book 

Publisher: Prentice-Hall, Inc. 

Full text available: pdf(98.28 MB) Additional Information: full citation, abstract, references, cited by, index terms

From volume 1 Preface (See Front Matter for full Preface) 

This book is intended for a one or two semester course in compiling theory at the senior 
or graduate level. It is a theoretically oriented treatment of a practical subject. Our 
motivation for making it so is threefold. 

(1) In an area as rapidly changing as Computer Science, sound pedagogy demands that 
courses emphasize ideas, rather than implementation details. It is our hope that the 
algorithms and concepts presen ... 

14 Special topic section on peer to peer data management: Design issues and
challenges for RDF- and schema-based peer-to-peer systems
Wolfgang Nejdl, Wolf Siberski, Michael Sintek

September 2003 ACM SIGMOD Record, Volume 32 issue 3 

Publisher: ACM Press 

Full text available: pdf(135.94 KB) Additional Information: full citation, abstract, references, citings

Databases have employed a schema-based approach to store and retrieve structured data 
for decades. For peer-to-peer (P2P) networks, similar approaches are just beginning to 
emerge. While quite a few database techniques can be re-used in this new context, a P2P 
data management infrastructure poses additional challenges which have to be solved 
before schema-based P2P networks become as common as schema-based databases. We 
will describe some of these challenges and discuss approaches to solve them. ... 

15 Semantics and implementation of schema evolution in object-oriented databases 
Jay Banerjee, Won Kim, Hyoung-Joo Kim, Henry F. Korth
December 1987 ACM SIGMOD Record, Proceedings of the 1987 ACM SIGMOD
international conference on Management of data SIGMOD '87, Volume 16 Issue 3
Publisher: ACM Press 

Full text available: pdf(1.54 MB) Additional Information: full citation, abstract, references, citings, index terms

Object-oriented programming is well-suited to such data-intensive application domains as 
CAD/CAM, AI, and OIS (office information systems) with multimedia documents. At MCC
we have built a prototype object-oriented database system, called ORION. It adds 
persistence and sharability to objects created and manipulated in applications 
implemented in an object-oriented programming environment. One of the important 
requirements of these applications is schema evolution, that is, the ability to dy ... 

16 Data processing in the large: BIwTL: a business information warehouse toolkit and
language for warehousing simplification and automation
Bin He, Rui Wang, Ying Chen, Ana Lelescu, James Rhodes

June 2007 Proceedings of the 2007 ACM SIGMOD international conference on
Management of data SIGMOD '07
Publisher: ACM Press 

Full text available: pdf(355.95 KB) Additional Information: full citation, abstract, references, index terms

Rapidly leveraging information analytics technologies to mine the mounting information in 
structured and unstructured forms, derive business insights and improve decision making 
is becoming increasingly critical to today's business successes. One of the key enablers of 
the analytics technologies is an Information Warehouse Management System (IWMS) that 
processes different types and forms of information, builds, and maintains the information 
warehouse (IW) effectively. Although traditional mul ... 

Keywords: data mining, information warehouse, warehousing language 



17 A graphical definition of authorization schema in the DTAC model 
Jonathon E. Tidswell, John M. Potter 

May 2001 Proceedings of the sixth ACM symposium on Access control models and 
technologies SACMAT '01 

Publisher: ACM Press 

Full text available: pdf(186.83 KB) Additional Information: full citation, abstract, references, index terms

The specification of constraint languages for access control models has proven to be 
difficult but remains necessary for safety and for mandatory access control policies. While
the authorisation relation $(Subject \times Object \rightarrow \pow Right)$ defines the
authorised permissions, an authorisation schema defines how the various concepts (such
as subjects, users, roles, labels) are combined to form a complete access control
model. Using examples drawn from common access contr ... 

Keywords: DTAC, access control, computer security, constraints, dynamic, graphs, roles, 

18 Computation: finite and infinite machines
Marvin L. Minsky 
January 1967 Book 

Publisher: Prentice-Hall, Inc. 

Additional Information: full citation, abstract, references, cited by, index terms
From the Preface (See Front Matter for full Preface) 

Man has within a single generation found himself sharing the world with a strange new 
species: the computers and computer-like machines. Neither history, nor philosophy, nor 
common sense will tell us how these machines will affect us, for they do not do "work" as 
did machines of the Industrial Revolution. Instead of dealing with materials or energy, we 
are told that they handle "control" and "information" and even "intellectua ... 

ABSTRACT



Data warehousing and on-line analytical processing (OLAP) are essential elements of decision 
support, which has increasingly become a focus of the database industry. Many commercial products
and services are now available, and all of the principal database management system vendors now 
have offerings in these areas. Decision support places some rather different requirements on 
database technology compared to traditional on-line transaction processing applications. This paper 
provides an overview of data warehousing and OLAP technologies, with an emphasis on their new 
requirements. We describe back end tools for extracting, cleaning and loading data into a data 
warehouse; multidimensional data models typical of OLAP; front end client tools for querying and 
data analysis; server extensions for efficient query processing; and tools for metadata management 
and for managing the warehouse. In addition to surveying the state of the art, this paper also 
identifies some promising research issues, some of which are related to problems that the database 
research community has worked on for years, but others are only just beginning to be addressed. 
This overview is based on a tutorial that the authors presented at the VLDB Conference, 1996. 

DEPARTMENTS



Data Warehouse Glossary 

This glossary is a compilation of definitions contributed by the experts. 

Access: The act of retrieving data from the data warehouse databases. 

Access Path: The path selected by the database management system to 
locate and retrieve requested data. 

Ad hoc query: A request for information that is normally fabricated and run a 
single time and cannot be anticipated in advance. It consists of an SQL 
statement that has been constructed by a knowledgeable user or through a 
data access tool. 

Aggregation: The process by which data values are collected with the intent 
to manage the collection as a single unit. 

Example: The combination of fields for the same customer extracted from 
multiple sources. 

Analysis: The act of evaluating the data retrieved from the data warehouse. 

Analytics applications: Processes that produce information for management 
decisions, usually involving demographic analysis, trend analysis, pattern 
recognition, drill-down analysis and profiling. 

Examples of analytics applications include: customer segmentation, 
customer probability models, campaign measurement, up-sell opportunities, 
cross-channel analysis, sales distribution analysis, cross-sell opportunities, 
trigger inventory analysis, supply chain analysis, customer quality analysis, 
channel satisfaction measurement, click stream analysis, backlog analysis, 
churn analysis, interaction analysis, booking analysis, billing analysis, 
distribution analysis, retention analysis, delivery analysis, fulfillment analysis, 
and promotion effectiveness. 

Anomaly: A deviation, irregularity, or an unexpected result. A data anomaly 
may occur when a data field defined for one purpose is used for another. 
Examples of anomalies are negative numeric fields that should be positive 
(negative number of dependents), abnormally high numeric values (person 
weighing 3000 pounds), pairs of values in related columns that make no sense 
(male patient having a hysterectomy). 
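
The three kinds of anomaly named above can be sketched as simple rule checks. This is only an illustration: the field names (dependents, weight_lb, sex, procedure) and the 1000-pound threshold are assumptions made for the example, not part of the glossary.

```python
# Hypothetical record layout and threshold, chosen only for illustration.
MAX_PLAUSIBLE_WEIGHT_LB = 1000


def find_anomalies(record):
    """Return a list of anomaly descriptions found in one record."""
    problems = []
    # A negative value in a field that should never be negative.
    if record.get("dependents", 0) < 0:
        problems.append("negative number of dependents")
    # An abnormally high numeric value.
    if record.get("weight_lb", 0) > MAX_PLAUSIBLE_WEIGHT_LB:
        problems.append("implausible weight")
    # A pair of values in related columns that make no sense together.
    if record.get("sex") == "M" and record.get("procedure") == "hysterectomy":
        problems.append("procedure inconsistent with sex")
    return problems
```

In practice such rules would likely be driven by the business rules and domains recorded for each field rather than hard-coded.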

Architect: A person or team who defines how the environment for the data 
warehouse, analytics application, or operational system is built. 

Architecture: A framework for organizing the planning and implementation of
data resources. The set of data, processes, and technologies that an
enterprise has selected for the creation and operation of information systems.
The blueprint that describes the environment in which the data warehouse,
analysis application, or operational system is built.

ASP - Application Service Provider: A company whose business is 
providing application services for its client companies. Such applications can 
include both tactical systems, such as billing systems, or strategic solutions 
such as CRM. (CRM ASPs currently account for over half the ASP market.) 

Atomic data: Data at its most granular and detailed level. 

Attributes: In logical data modeling, attributes of an entity refer to the 
properties of that entity. Each property will have one distinct value per instance 
of the entity. Example: Entity = Automobile, Attribute = color, attribute value = 
red. When logical models are translated into physical data models, entities 
become tables, and attributes become columns. Note: there is not necessarily 
a 1:1 correlation between the logical model objects and the physical model 
objects. 

Availability: The percentage of time during scheduled hours that the system 
can be used. It also can refer to the days/week and the hours/day that the 
system is scheduled for use. See Service Level Agreements 
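
As a worked example of the percentage form of this definition, the sketch below assumes availability is measured as usable scheduled time divided by total scheduled time. The function name and the sample schedule are hypothetical.

```python
def availability_percent(scheduled_hours, outage_hours):
    """Percentage of scheduled time during which the system could be used."""
    if scheduled_hours <= 0:
        raise ValueError("scheduled_hours must be positive")
    if not 0 <= outage_hours <= scheduled_hours:
        raise ValueError("outage_hours must lie between 0 and scheduled_hours")
    return 100.0 * (scheduled_hours - outage_hours) / scheduled_hours


# Hypothetical schedule: 5 days/week at 12 hours/day, with 3 hours of outage.
weekly_availability = availability_percent(5 * 12, 3)
```

Outages outside the scheduled hours do not enter the calculation, which is why the schedule itself is part of the definition.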

Back-end: Populating the data warehouse with data from operational source 
systems. 

Base table: In relational databases, tables are defined as temporary or base. 
Base tables are the tables that are created by the CREATE TABLE command 
and are used for persistent storage. 

Batch windows: The time that is required to run the ETL process from 
beginning to end. 

Best-of-breed: Refers to the most effective, powerful, functional and optimal 
choice of product in each category of tool. As organizations choose tools, they 
must decide whether they wish to choose a suite of products from the same
vendor (where some of the tools in the suite are not terrific) or choose the best 
product in each category, i.e., best of breed, and integrate those tools 
themselves. 

Best practices: Processes and activities that have been shown in practice to 
be the most effective. 

Beta release: A version of the vendor's software that is given to selected 
installations prior to the product becoming generally available. This version is 
often not free of defects. 

Big Bang (approach): Delivering all the intended functions of the data 
warehouse at the same time.

Bitmapped indexes: This is an alternate (to B-tree) indexing mechanism that 
involves building streams of bits where each bit is related to the column value 
for a single row of data in a table. The use of bitmapped indexes on low- 
cardinality fields (fields that have few possible distinct values) improves query 
performance significantly. 
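
A minimal sketch of the idea, assuming one bit stream per distinct column value with bit i set when row i holds that value. Python integers stand in for the bit streams, and the column names and data are hypothetical; production systems typically also compress the bitmaps.

```python
from collections import defaultdict


def build_bitmap_index(values):
    """Map each distinct value to an int whose bit i is set when row i has it."""
    index = defaultdict(int)
    for row, value in enumerate(values):
        index[value] |= 1 << row
    return dict(index)


def rows_matching(bitmap):
    """Decode a bitmap back into an ascending list of row numbers."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


# Hypothetical low-cardinality columns for five rows.
region = build_bitmap_index(["east", "west", "east", "north", "west"])
status = build_bitmap_index(["open", "open", "closed", "open", "closed"])

# region = 'west' AND status = 'open' reduces to one bitwise AND.
hits = rows_matching(region["west"] & status["open"])
```

Combining predicates with bitwise AND and OR, instead of intersecting row lists, is one reason such indexes suit fields with few distinct values.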

Boilerplate: Standard verbiage that can be used multiple times for the same 
purpose. Vendors respond to RFPs with boilerplate so they do not have to 
write the same material multiple times. 

Business analyst: The person whose job it is to analyze the operation and 
data of the business to develop a business solution. 

Business drivers: The tasks, the information and the people that promote and 
support the goals of the enterprise. The requirements that describe what the 
business wants (e.g., more quality data, faster response to queries). A problem 
in the business that is important enough to spell the difference between 
success and failure for an organization. 

Business intelligence (BI): Normally describes the result of in-depth analysis
of detailed business data. Includes database and application technologies, as 
well as analysis practices. Sometimes used synonymously with "decision 
support," though business intelligence is technically much broader, potentially 
encompassing knowledge management, enterprise resource planning, and 
data mining, among other practices. 

Business process engineering: The analysis and re-design of business 
processes and associated technology systems, with the goal to eliminate or 
reduce redundancy and streamline interactions. 

Business rules: Policies by which a business is run. The business rules 
contain constraints on the behavior of the business. The assertions that define 
data (e.g., the state code business rule might be the 50 United States, the 
District of Columbia and the U.S. Territories) from a business point of view. 

Business sponsor: Manager or executive who acts as visionary for the data
warehouse program and can articulate how the data warehouse can drive 
business improvements. Establishes the "need, pain, or problem" the data 
warehouse will solve, serves as a tiebreaker for issues during the project, and 
might actually fund some or all of the data warehouse development. 
See Sponsor 

Business timestamp: A business timestamp is a timestamp that is generated 
by a business event and not a result of a systems operation. Examples are: 
sales_timestamp, order_timestamp, shipment_date, etc. Typically, all facts in a
Data Warehouse have at least one business timestamp, which can be traced 
to a transaction in the source operational system. 

Business users: Personnel reporting to the line-of-business who access the 
data warehouse by writing reports and queries or who use the reports and
queries generated by others. 
See Users 

Caching: As related to caching reports, this involves storing the results of pre- 
run reports in tables (instead of caching to memory as the usage of the word 
implies) so that when the user accesses the report for the first time, it seems to 
run instantaneously. This is a feature provided by the server component of 
many of the popular OLAP tools. 

Campaign Analysis: Campaign analysis provides a measurement of 
responsiveness to campaigns by households and by individual customers. It 
provides the ability to measure the effectiveness of individual campaigns and 
different media and offers the ability to conduct cost-benefit analysis of 
campaigns. 

CEO: Chief Executive Officer 
CFO: Chief Financial Officer 

Champion: The (high level) person in the organization who supports and 
promotes the data warehouse, its use, and those who developed and maintain 
it. A person with sufficient clout in the organization who believes in and sells 
the idea of the data warehouse and helps solve problems between groups. 

Channels: The method/means by which a product or service is marketed, 
ordered, and delivered. 

Charge back: The process of assessing and assigning the costs of a system 
to the departments that use it. 

Check totals: "Check totals" is a loose term used to describe the total sum of 
the values in an additive column of data across all rows of data that are within 
scope. This total is usually calculated before and after moving data across 
platforms or processing data in order to ensure no data was lost. 
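
The before-and-after comparison described above can be sketched as follows. The row layout and the column name amount are assumptions for illustration; Decimal is used so that the two totals compare exactly.

```python
from decimal import Decimal


def check_total(rows, column):
    """Sum an additive column across all rows that are within scope."""
    return sum((Decimal(str(row[column])) for row in rows), Decimal("0"))


def totals_agree(source_rows, target_rows, column):
    """True when the check totals before and after the move are equal."""
    return check_total(source_rows, column) == check_total(target_rows, column)


# Hypothetical data: the second target has lost one row in transit.
source = [{"amount": "10.50"}, {"amount": "4.25"}]
target_ok = [{"amount": "4.25"}, {"amount": "10.50"}]
target_lost = [{"amount": "10.50"}]
```

Equal totals do not prove the data is identical, since offsetting errors can cancel, so a check total is usually paired with a row count.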

CIO: Chief Information Officer 

Class: A collection of objects that share common properties, common 
definitions and common behaviors. 

Clickstream: Series of page visits and associated clicks executed by a Web 
site visitor when navigating through the site. Analysis of clickstream data can 
help a company understand which products, Web site content, or screens were 
of most interest to a given customer. 

CMM: Capability Maturity Model: Developed by the Software Engineering 
Institute (SEI), the CMM is a representation of the goals, methods, and 
practices needed for the industrial practice of software engineering. The goal of 
the model is to have processes that are repeatable, defined, managed, and 
optimized.

Conformed dimensions: A dimension defines how the measures (facts) are
organized; it is the entity that describes "how" an organization measures a
fact. A conformed dimension is a dimension whose use and semantics are agreed
upon across the enterprise, which makes it "conformed".

COO: Chief Operating Officer

Consultant: A consultant is someone who provides expertise and can be an 
advisor or a deliverer of tasks. Consultants are hired for their expertise when 
the company has none.

Consultants often help define the data warehouse strategy and assess the 
organization's ability to implement the data warehouse. 

Contractor: A contractor is a person who provides the delivery of tasks. The 
contractor might be responsible for building the ETL process or for overseeing 
the DBA functions. Contractors are hired when the company has a shortage of 
skilled workers. The company tells them what needs to be done, and the 
contractors perform the work. 

Control totals: The addition of values of specific fields to verify that the ETL 
job streams have executed properly. Cross footing of numbers to verify that a 
process (e.g. ETL) has executed successfully. 

Corporate information factory (CIF): The framework that
surrounds the data warehouse; typically contains an ODS, a data warehouse, 
data marts, DSS applications, exploration warehouses, data mining 
warehouses, alternate storage, and so forth. 

Cost/benefit analysis: The process by which the value of a project is 
estimated based on the expected costs compared to the tangible benefits 
usually expressed as increased revenue, or reduced cost. 

Critical success factor: An element that contributes to the success of a 
project, without which the project will fail. 

CRM - Customer relationship management: Infrastructure that enables 
delineation of and increase in customer value and the correct means by which 
to increase customer value and motivate valuable customers to remain loyal - 
indeed, to buy again. A collection of integrated applications, which facilitate the 
seamless coordination between the back office systems, the front office 
systems, and the web. The DSS expansion of CRM Analytics refers to 
customer-centric analytics applications. 

Cross organizational: Includes multiple departments within an organization. A 
non-redundant and horizontally cross-functional view of the business. 

Cross Selling: Selling an additional category of products as a result of the 
customer's original purchase. 

CTO: Chief Technology Officer

Customer Segmentation: Separating customers by factors such as age, 
gender, educational background, and liking or disliking Wayne Newton. 

DA - Data administrator: The role responsible for the enterprise's data 
resources and for the administration, control, and coordination of all data 
related analysis activities. The DA has the responsibility for planning and 
defining the conceptual framework for the overall data environment. The 
functions of the DA typically include requirements definition, logical data 
modeling, data definitions, logical to physical mapping, maintenance of 
inventory of the current system, data analysis, and the meta data repository. 

DASD - Direct access storage device: Rotating magnetic disk storage.

Data architecture: The framework for organizing the planning and 
implementation of data resources. The set of data, processes, and 
technologies that an enterprise has selected for the creation and operation of 
information systems. 

Data analysis: The systematic study of data so that its meaning, structure, 
relationships, origins, etc. are understood. 

DBA - Database administrator: The Database Administrator is responsible 
for the physical aspect of the data warehouse. This includes physical design, 
performance, and maintenance activities including backup and recovery.

Data loading: The process of populating a data warehouse. It may be 
accomplished by utilities, user-written programs, or specialized software from 
independent vendors. 

Data mapping: The process of identifying a source data element for each data 
element in the target environment. 
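
One way to picture this definition is a lookup from each target element to the source element that feeds it. The element names below are hypothetical, and real mappings usually also record transformation rules.

```python
# Hypothetical mapping: target data element -> source data element.
SOURCE_FOR_TARGET = {
    "customer_name": "CUST_NM",
    "postal_code": "ZIP_CD",
}


def unmapped_targets(target_elements, mapping):
    """List target elements that still lack an identified source element."""
    return [t for t in target_elements if t not in mapping]


def apply_mapping(source_row, mapping):
    """Build a target row by pulling each value from its mapped source element."""
    return {target: source_row[source] for target, source in mapping.items()}
```

A non-empty result from unmapped_targets signals that the mapping exercise is not yet complete for the target environment.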

Data mart: An implementation of an analytics application serving a single 
department, subject area, or limited part of the organization. Usually refers to a 
physical platform on which summarized data is stored for decision support. 
Data marts are commonly used for specific analysis purposes by a single 
organization or user group. 

Data mining: Discovery mode of data analysis, or analyzing detail data to 
unearth unsuspected or unknown relationships, patterns and associations that 
might be of value to the organization. Advanced analysis used to determine 
certain patterns within data. Most often associated with predictive analysis. A 
process of analyzing large amounts of data to identify patterns, trends,
activities, and relationships within the data content.



Data ownership: Responsibility for determining the required quality of the 
data, for establishing security and privacy for the data and determining the
availability and performance requirements for the data. Data originators who
have the authority, accountability, and responsibility to create and enforce 
organizational rules and policies for business data. 
See Ownership 

Data stewardship: Responsibility for the quality of the business data; an 
information expert about a particular subject area. 
See Stewardship 

Data Warehouse Manager: The data warehouse manager has overall responsibility for
all the organization's data warehouse initiatives, for data warehouse standards, 
and for data warehouse tools. The data warehouse project managers may 
report to the data warehouse manager or they may report to individual 
sponsors. 

Data Warehouse Project Manager: See Project Manager 

Data quality: The degree of excellence of data. Factors contributing to data
quality include: the data is stored according to its data type, the data is
consistent, the data is not redundant, the data follows business rules, the data
corresponds to established domains, the data is timely, the data is well
understood, the data satisfies the needs of the business, the user is satisfied
with the validity of the data and the information derived from that data, the data
is complete, and there are no duplicate records. For example, this means that
a customer's name is spelled correctly and the address is correct.

Data staging: The storage of data prior to it being loaded into a data 
warehouse or data mart. 
See Staging area 

Data Warehouse: A collection of integrated, subject-oriented databases
designed to support the DSS function, where each unit of data is relevant to 
some moment in time. The data warehouse contains atomic data and lightly 
summarized data. 

DDL - Data definition language: The SQL syntax used to define the way the
database is physically organized. 

Deadline: The point in time by which a project must be completed. 

Deliverable: The tangible output from a task or a project, e.g. logical model, 
project agreement, database design or application. 

Delta: A change, e.g. the difference from one period to the next. 

Demo: Short for demonstration as in a vendor demonstration of software to 
impress the users. 

Denormalization: Data or data design elements that do not conform to the 
rules of data normalization. Denormalized data structures are often used in
databases to provide rapid access for specific user needs. Denormalization
usually results in some degree of data redundancy in a data record. A process 
of combining like data into a single entity (table or file). This combining will 
create duplicate data. 

Departmental systems: A data mart implementation that serves the needs of 
only a single department such as Human Resources or Finance.
See enterprise systems 

Derivations: The transformation of data in the ETL process in which the data 
is created through the use of an algorithm based upon data from multiple 
sources. 

Derived data: A new data element that is created from or composed of other 
data elements. 

Design review: A peer review of project deliverables, such as design 
specifications, program code or test specifications. The objective of the review 
is to find weaknesses, errors and problems. The process where different 
groups are given access to the design to provide input on how it might be 
changed to 1) work best with the tools selected or 2) be complete in its solving 
the problem. 

Dimensional hierarchy: A dimensional hierarchy refers to the different levels 
of data within a dimension that data can be rolled up to or down to for analysis. 
This can be represented in a data model by a series of related tables with 
parent-child relationships (snowflake schemas) or by multiple columns within
a dimension table (standard star schemas) called hierarchy columns. Example: 
the dimensional hierarchy of a sales organization could include the following 
levels: salesperson, branch, territory, region, company. 
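
The roll-up described above can be sketched in Python. This is only an
illustration: the dimension rows, the sales figures, and the function name
roll_up are hypothetical, and the hierarchy is stored as columns on each
dimension row, as in the star schema variant.

```python
from collections import defaultdict

# Hypothetical dimension rows using hierarchy columns, as in a star schema.
salespeople = {
    "alice": {"branch": "B1", "territory": "T1", "region": "East"},
    "bob": {"branch": "B2", "territory": "T1", "region": "East"},
    "carol": {"branch": "B3", "territory": "T2", "region": "West"},
}

# Hypothetical facts recorded at the lowest level (salesperson).
sales = [("alice", 100), ("bob", 50), ("carol", 70), ("alice", 30)]


def roll_up(level):
    """Aggregate sales amounts to the requested hierarchy level."""
    totals = defaultdict(int)
    for person, amount in sales:
        totals[salespeople[person][level]] += amount
    return dict(totals)
```

Calling roll_up("branch"), roll_up("territory"), or roll_up("region") moves
the same facts up the hierarchy one level at a time.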

Dimension data: An entity used to describe, qualify, or otherwise add 
meaning to "facts" in a star schema fact table. Dimensions are the "by" items in 
analysis of facts "by" product, market, time, period, etc. Descriptive data that 
describes the measurements (facts) that business users wish to analyze. 

Domain (synonym valid values): A set of data values which represent the full 
range of allowable values that may be used for a given data attribute. Defines 
validity criteria for a particular column or field. Domains include data types and 
valid values. For example, Gender could be a domain defined as having the data
type of Character of 1 byte containing "F" for Female, "M" for Male, and "N" for 
Not so Sure. 
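
A domain check for the Gender example might look like the following Python
sketch. The names GENDER_DOMAIN and in_gender_domain are hypothetical; the
check simply combines the data type (a one-character string) with the list of
valid values.

```python
# Hypothetical domain definition following the Gender example in the text:
# a one-character value restricted to "F", "M", or "N".
GENDER_DOMAIN = {"F", "M", "N"}


def in_gender_domain(value):
    """Return True if value has the right data type, length, and allowed value."""
    return isinstance(value, str) and len(value) == 1 and value in GENDER_DOMAIN
```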

DSS: Decision Support System 
See Decision Support System 

EIS: Executive Information System: A system that lets upper management 
view the organization's performance at a highly summarized level and usually 
in a graphical representation.

End-user: See User, Business User

Enterprise data model: A logical data model that incorporates all the 
important components of an enterprise data architecture. Components include 
entities, attributes, relationships, rules and definitions stated in business terms. 
A schematic defining the data and their relationships that is applied to the 
whole organization. Diagram of a single non-redundant view of business data, 
showing how data is used by the business activities of an organization. 

Enterprise Data Warehouse: A collection of data that can be defined and 
shared across the whole enterprise along the lines of common dimensions to 
be used for analysis. 

Enterprise systems: Systems that support and are used by the entire
enterprise.
See departmental systems

Entity: A person, place, thing, concept or event about which an organization
collects data. 

ERP: Enterprise Resource Planning: Tying together and automating of diverse 
components of a company's operations, including ordering, fulfillment, staffing, 
and accounting. This integration is usually done using ERP software tools. 

ETL: Extract/Transform/Load: This is the process of extracting data from their 
operational data sources or external data sources, transforming the data which 
includes cleansing, aggregation, summarization, integration, as well as basic 
transformation (1 becomes "Male", 2 becomes "Female"), and loading the data
into some form of the data warehouse (ODS, enterprise data warehouse, data 
mart). ETL can also refer to the vendor software that performs these 
processes. 
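
As a rough illustration of the three steps, the following Python sketch
extracts rows from an in-memory stand-in for an operational source, applies
the basic code-to-label transformation mentioned above, and loads the result
into a list standing in for the warehouse. The names source_rows,
GENDER_LABELS, and run_etl are hypothetical, and real ETL tools do far more
(cleansing, aggregation, integration).

```python
# Hypothetical source rows, standing in for an operational system.
source_rows = [
    {"customer_id": 101, "gender_code": 1},
    {"customer_id": 102, "gender_code": 2},
]

# Basic transformation rule from the definition: 1 -> "Male", 2 -> "Female".
GENDER_LABELS = {1: "Male", 2: "Female"}


def extract(rows):
    """Extract: read every row from the source."""
    return list(rows)


def transform(rows):
    """Transform: replace the numeric gender code with its label."""
    return [
        {"customer_id": r["customer_id"], "gender": GENDER_LABELS[r["gender_code"]]}
        for r in rows
    ]


def load(rows, target):
    """Load: append the transformed rows to the target store."""
    target.extend(rows)
    return target


def run_etl(rows):
    """Run the three steps in order against an empty target."""
    return load(transform(extract(rows)), [])


warehouse = run_etl(source_rows)
```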

FAQs: Frequently Asked Questions: Questions that are repeated, usually 
asked by the users of the help desk or of the project support team. Software 
vendors also have FAQs which are usually asked by technical people who 
support the vendors' software. To minimize support requirements and to 
assure a consistent response, FAQs are normally captured, validated, and 
made available through a web site. 

Fact table: The central table in a star join schema, characterized by a 
composite key, each of whose elements is a foreign key drawn from a 
dimension table. Facts are information about the business, typically numeric 
and additive. A table that contains the measures that the business users wish 
to analyze to find new trends or to understand the success or failure of the 
organization. 

Federated database system (FDS): A federated database system is a 
collection of independently managed, heterogeneous database systems that 
allow partial and controlled sharing of data without affecting existing 
applications. An FDS presents an enterprise view of data.

Foreign keys: Foreign keys are columns on one table that are inherited from
the primary key of another table by means of a dependent or independent 
relationship. 

Front-end: The access and analysis piece of the data warehouse architecture. 

FTE: Full time employee, Full time equivalent 

FTP - File Transfer Protocol: A protocol, and the programs that implement it,
for transferring files from one computer to another.

Gap Analysis: The difference between what is needed and what is available. 
The difference between where you are and where you want to be.

Global 2000: The 2000 largest companies worldwide. 

Goal: An objective to be achieved within a specific period of time. 

Granularity: The level of the measures within a fact table represented by the 
lowest level of the dimensions. 

Hard dollar (benefits): Tangible benefits that can be measured. Hard dollar 
benefits can result from an increase in revenue or a reduction in cost. 

Historical data: Data from previous time periods, in contrast to current data. 
Historical data is used for trend analysis and for comparisons to previous 
periods. 

Infrastructure: The architectural elements, organizational support, corporate 
standards, methodology, data, processes, and physical hardware/network, etc. 
that make up the data warehouse environment. 

Integration: The activity of combining data from multiple data sources to 
present a single collection of data to the warehouse. 

Islands of automation: Systems that were developed without consideration 
for their ability to interface with each other. As a result, data stored in these 
systems is often redundant and inconsistent. 
See silos, stovepipes 

IPO: Initial public offering 

IT - Information Technology: The department that builds and maintains 
computer systems. 

Iteration: The division of a project in which functionality is provided to the 
users in a series of phases. 

Joins: Within the context of SQL, joining refers to the comparison of similarly 
valued keys across multiple tables for the purpose of selecting rows of data
from multiple tables. This is done by means of an SQL SELECT statement
where the comparison of the keys is performed in the WHERE clause. 
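
A join of this kind can be tried out with Python's built-in sqlite3 module.
The sketch below is illustrative only: the tables product_dim and sales_fact
and their contents are hypothetical, and the key comparison is placed in the
WHERE clause as the definition describes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, product_name TEXT);
    CREATE TABLE sales_fact (product_key INTEGER, amount INTEGER);
    INSERT INTO product_dim VALUES (1, 'Widget'), (2, 'Gadget');
    INSERT INTO sales_fact VALUES (1, 100), (1, 50), (2, 70);
    """
)

# The key comparison is written in the WHERE clause.
rows = conn.execute(
    """
    SELECT d.product_name, SUM(f.amount)
    FROM sales_fact f, product_dim d
    WHERE f.product_key = d.product_key
    GROUP BY d.product_name
    ORDER BY d.product_name
    """
).fetchall()
conn.close()
```

SQL also allows the same comparison to be written with explicit JOIN ... ON
syntax, which keeps join conditions separate from row filters.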

Justification: The process by which each project is evaluated to determine if 
there is financial viability in its implementation. The justification process also 
allows management to prioritize projects. 
See cost/benefit, ROI 

Knowledge Transfer: The act of transferring knowledge from one individual to 
another by means of mentoring, training, documentation, and other 
collaboration. 

Legacy system: Any existing production or operational system. Legacy 
systems often provide the source data for the data warehouse. 
See Operational Systems 

Libraries (queries and reports): Sets of programs that have been created, 
fully tested, quality assured, documented, and made available to the user 
community. The programs in these libraries are variously called canned, 
predefined, parameterized, or skeleton queries/reports. They are launched by 
the user, who only enters a variable such as a date, region number, range of 
activity or some other set or sets of values the program needs to generate a 
query or report. 

Line of business: Divisions of a company responsible for the production and 
creation of the organization's products and/or services. IT, HR and Accounting 
are not lines of business. 

Logical data model: An abstract formal representation of the categories of 
data and their relationships in the form of a diagram, such as an entity- 
relationship diagram. A logical data model is process independent, which 
means that it is fully normalized, and therefore does not represent a process 
dependent (e.g. access-path) database schema. 

Market Penetration: The percentage of the market owned by a company as 
represented by share of revenue. 

Matrix management: A reporting structure in which the manager does not 
hold the performance and payroll card of the subordinate. This is synonymous 
with dotted line responsibility. 

Mentor: A person who provides guidance and recommendations to a more 
junior person for courses of action and behavior. 

Meta data: "Data about data." Usually refers to agreed-on definitions and 
business rules stored in a centralized repository so business users - even 
those across departments and systems - use common terminology for key 
business terms. Can include information about data's currency, ownership, 
source system, derivation (e.g. profit = revenues minus costs), or usage rules. 
Prevents data misinterpretation and poor decision making due to sketchy
understanding of the true meaning and use of corporate data.

Methodology: Proven processes followed in planning, defining, analyzing, 
designing, building, testing, and implementing a system. 

Metrics: Any type of measurement. Metrics could include business results, 
quantification of system usage, average response time, benefits achieved, etc. 
The measures that an organization believes are vital for its success.

Milestone: A tangible event used to measure the status of the project. Markers 
during the execution of a project that show the movement of a project in the
right direction. 

Mission: A high level set of goals of the organization. For example, to be the
low cost producer or the company with the highest level of customer 
satisfaction. 

MPP - Massively Parallel Processing: A parallel hardware organization that 
de-emphasizes the sharing of memory resources. 

Multidimensional: The aggregation of data along the lines of the dimensions 
of the business, e.g. sales by region by product by time. 

Near-line storage: Data storage that is not on-line and not with immediate 
access. 

Networking: 

(1) Connecting with people of like interests for the purpose of uncovering 
opportunities, identifying landmines and learning of best practices. 

(2) The ability to tie more than one component together through protocols (e.g.
TCP/IP) 

Object: An instance which is a member of a class. 

Objective: Desired outcome of the delivery of the project. An objective can be 
measured. 

OCM: Organizational Change Management 

OLAP - Online Analytical Processing: "Drilling down" on various data 
dimensions to gain a more detailed view of the data. For instance, a user might 
begin by looking at North American sales and then drill down on regional sales, 
then sales by state, and then sales by major metro area. Enables a user to 
view different perspectives of the same data to facilitate decision-making. 

OLTP - Online transaction processing: Defines the transaction processing 
that supports the daily business operations. 

OO - Object oriented: A self-contained module of data and its associated
code.

Operational data: Data that supports the production systems that run the
business. This includes, but is not limited to, OLTP systems. 

Operational system: The system that creates, updates and accesses 
production systems. Operational systems do not access or update decision support systems.
See Legacy system 

Organizational change management: Major change is defined as those 
situations in which performance of job functions requires most people
throughout the organization to learn new behaviors and skills. Major change 
encompasses an entire workforce and can focus on innovation and skill 
development of people. 

To some degree, the downside effects of change are inevitable. Whenever 
groups of people are forced to adjust to shifting conditions, discomfort will 
occur. The key is to proactively recognize the effects of change, plan for the 
change, and develop skill sets and tools to support the change and inevitable 
discomfort associated with it. Without this proactive approach, the risk of poor 
project implementation increases significantly and reduces the opportunity to 
achieve expected compliance. 

Outsourcing: Assigning responsibility for all or a portion of the activity and 
tasks involved in developing and/or running and maintaining a system to a 
vendor outside of the organization. 

Ownership, Owners of source data: One of the more controversial and 
disputed ideas. The person or group who has responsibility for determining 
who can access the data warehouse (security), the domains of the data, the 
performance and availability requirements. 
See Data Ownership 

Pain: An unfulfilled business need that jeopardizes the success of the 
organization. 

Parallelism: The ability to run the same process simultaneously (in parallel) 
on more than one processor.

Partitioning: The ability to divide a table into pieces (partitions). The division 
can be horizontal (by data value - for example by date) or vertical (by columns 
- for example, most used columns in one partition, the least used columns in 
another partition.). 
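
Both kinds of division can be illustrated on a small in-memory table. In this
Python sketch the orders rows, the choice of month as the partitioning value,
and the function names are all hypothetical; a real database would partition
through its own DDL options.

```python
# Hypothetical table rows.
orders = [
    {"order_id": 1, "order_date": "2001-01-15", "amount": 100, "notes": "rush"},
    {"order_id": 2, "order_date": "2001-02-03", "amount": 250, "notes": ""},
    {"order_id": 3, "order_date": "2001-01-28", "amount": 75, "notes": "gift"},
]


def horizontal_partition(rows):
    """Split rows by data value: one partition per month of order_date."""
    partitions = {}
    for row in rows:
        month = row["order_date"][:7]  # "YYYY-MM"
        partitions.setdefault(month, []).append(row)
    return partitions


def vertical_partition(rows, hot_columns, key="order_id"):
    """Split columns: most used columns in one partition, the rest in another.

    The key column is kept in both partitions so the pieces can be rejoined.
    """
    hot = [{c: r[c] for c in [key, *hot_columns]} for r in rows]
    cold = [{c: r[c] for c in r if c == key or c not in hot_columns} for r in rows]
    return hot, cold


by_month = horizontal_partition(orders)
hot, cold = vertical_partition(orders, ["amount"])
```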

Periodicity: The frequency of load/update/refresh of the data warehouse, e.g. 
daily, weekly, monthly. 

PERT Chart: A graphical representation showing the critical path for a project 
applied to a calendar.

Phasing: The method of delivering the data warehouse in separate groupings
of functionality to particular groups of users rather than delivering everything all 
at once to all the intended users. 

Physical Data Model: A formal representation of data and their relationships 
in the form of a diagram, depicting the physical placement of data in a 
database. A physical data model is process dependent, which means that it is 
denormalized to provide maximum performance efficiency. It is commonly 
referred to as logical database design or database design schema. 

Pilot: The initial implementation of a data warehouse. A pilot is always a 
subset of the intended function and would include a subset of the total set of 
users. A partially built system to show the capabilities of a fully implemented
system. A pilot should not become a live system, but usually does. A pilot, 
proof of concept and prototype are sometimes used synonymously. 

Platform: The hardware, operating system and database management or file
system on which the data warehouse runs.

Political agenda: The plans of an individual to enhance his or her position in 
the organization. 

Power users: Knowledge workers who are capable of writing complex queries 
and reports with little need for help. 

Primary key: Refers to the column(s) on a relational table that uniquely define 
a row of data on that table. 

Project agreement: A document outlining the scope of a project including the 
deliverables, the functions, tools to be used, service level agreements, 
responsibilities and schedule. The project agreement sometimes includes the 
anticipated milestones. 

Project Manager: Sometimes referred to as the data warehouse project 
manager, the Project Manager has overall responsibility for a project's 
successful implementation. The Project Manager defines, plans, schedules, 
and controls the project. The project plan must include tasks, deliverables and 
resources - the people who will perform the tasks. The manager will monitor 
and coordinate the activities of the team, and will review their deliverables. If 
contractors and consultants are used, the Project Manager assigns the tasks, 
monitors activities and deliverables and assures that knowledge transfer is 
indeed taking place. 

Project Management Office: Sometimes called project office. This is the 
office or department responsible for establishing, maintaining and enforcing 
project management processes, procedures, and standards. It provides 
services, support, and certification for project managers. 

Proof-of-concept: Software trial that allows a prospect to try out the product
before buying it. Delivers a realistic slice of functionality and is often used as
the foundation for the first application. A quickly built system to show the 
capabilities of an idea. A proof-of-concept should not become a live system, 
but usually does. A pilot, proof of concept and prototype are sometimes used 
synonymously. 

Prototype: A less formal experimental and experiential development process 
of a proposed application for the purpose of demonstrating some or all of its 
functional capabilities. A prototype does not have the same rigorous testing, 
documentation, and implementation requirements as a software release or an 
application does, and should therefore never be implemented as-is.

Quality: The absence of any defect. The characteristics of a system that 
conforms to the original design. A system of quality would have the following 
characteristics: 1. Maintainability (easy to add new functions), 2. Conformance 
to specifications (fulfilling end user requirements), 3. Long mean time to failure 
(few bugs and abnormal terminations), 4. Performance that is adequate or as 
expected, 5. Well tested for functionality, user interface, and performance, 6. 
Well documented, 7. Easy to use, and 8. Uses standard interfaces. 

QA - Quality Assurance: The department, role or process responsible for
validating that which is proposed to ensure a correct outcome. The planned 
and systematic activities to provide confidence that a product or service will 
fulfill requirements for quality. 

RAD: Rapid Application Development: A process where the time is set
(timeboxed) and a small set of deliverables is implemented in a reasonably
short period of time.

RDBMS - Relational database management system: e.g. DB2, Oracle, SQL 
Server, Sybase 

Real time: Data that is captured, and made available as it is happening. Real 
time data reflects the latest status of the organization's operational transaction 
data. Current moment in time. Real time refers to what is happening to any 
piece of data right now. For analysis, some people want to see current rather 
than historical data as is the case with most data warehouses. 

Recursive: A relationship between two instances of the same entity, as in 
"recursive data design". 

Referential integrity: The concept of enforced relationships between tables 
based on the definition of a primary key and foreign key. 
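
Enforcement of such a relationship can be observed with Python's built-in
sqlite3 module. This is a sketch under stated assumptions: the customer and
orders tables are hypothetical, and SQLite only enforces foreign keys after
the foreign_keys pragma is switched on for the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE orders ("
    " order_id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customer(customer_id))"
)
conn.execute("INSERT INTO customer VALUES (1, 'Acme')")
conn.execute("INSERT INTO orders VALUES (10, 1)")  # parent row exists: accepted

try:
    conn.execute("INSERT INTO orders VALUES (11, 99)")  # no customer 99
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
conn.close()
```

The second insert is refused because its foreign key value has no matching
primary key in the parent table, so only the valid order remains.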

Release concept: A new approach to development that produces a fully 
tested, fully documented, high-quality, but only partially functioning application 
until the final release, which completes the application. The release concept 
severs the notion that a project deliverable must equal a complete application. 
Instead it tightens and expands on the concept of a pilot by producing a 
partially functioning application, which is refined and enhanced several more
times through several more releases before it becomes a fully functioning
application. This concept is the embodiment of iterative development and is 
fully compatible with XP (extreme programming) and the new agile and 
adaptive methodologies. 

Resources: People and budget needed to perform the data warehouse tasks.

RFP - Request for Proposal: A formal request to a vendor to submit a 
proposal to provide a product or service. 

ROI - return on investment: Usually represented as a percentage of tangible 
monetary value in relation to the cost of the system. 

Rolled up: Aggregated to a higher level.

Scalable: Ability to increase the number of users, the size of the databases 
and the complexity of the queries and reports without having to replace the 
existing platform or architecture. 

Scope: An itemized accounting and definition of the agreed upon project 
deliverable in terms of functionality as well as data. In data warehousing, the 
data scope is more critical than the functional scope for correctly estimating the 
development effort. 

Scope creep: The addition of new requirements, source data or users to the 
initial agreement of what the project will be delivering. 

Semantic layer: A layer between the end-user tool and the database. This 
allows the end-user tool to present the data most effectively for the end-user 
understanding and then to generate the proper query to the database. 

Service level agreement (SLA): The definition of a level of service provided 
by the IT department for a particular system. Service level agreements can be 
established for availability (24 hours/day, 7 days/week and 98% during 
scheduled hours), for performance (response time for 95% of queries in 1 minute
or less), for timeliness of the data (weekly data available 6 AM Monday 
morning), or for other reasons. Contract with a service provider - be it an 
internal IT organization, an ASP, or an outsourcer - specifying discrete 
reliability and availability requirements for a given system. Might also include 
such requirements as support of certain technology standards or data volumes. 
Outsourcer's failure to adhere to the terms laid out in the SLA could result in 
financial penalties. 

Sign-off: The process of agreeing - in writing - to the scope of a project or the 
acceptability of a deliverable. 

Silo, siloized: A silo system cannot easily integrate with any other system. 
This means we have multiple versions of the same data, violating the idea of a 
single version of the truth. 
See stovepipe and islands of automation

Single version of the truth: A primary goal of the data warehouse wherein
the data to be accessed resides in only one database so that there will be no 
conflicting data and no inconsistent reports. 

Shelfware: Software that is not being used, as in "sitting on the shelf."

SMP - Symmetrical Multi-Processing: A parallel hardware organization that 
emphasizes the sharing of memory resources. 

Snowflake structure: Snowflake is a star schema with normalized 
dimensions. 

Source data: The data from the operational or legacy systems that feed the 
ETL process. 

Source system: An operational system, or ODS that is used as the source or 
input to the ETL process. 

Sponsor: The person in the organization, usually from the business side - 
who supports the project. This person should be someone with power, money 
and commitment to the project. 
See Business Sponsor 

Staging area: A staging area is where the ETL programs execute and where 
the source data is prepared for the data warehouse. 
See Data Staging 

Stakeholders: People who have a vested interest in the success of the project 
or are involved in the implementation of the project. 

Standards: A standard is "Thou shall" while a guideline is a recommendation, 
more like "You should if your situation warrants." Data warehouse standards 
examples include: meta data, terminology, data stewardship, and privacy. 

Star schema: A modeling paradigm that has a single object in the middle (fact
table) connected to a number of objects (dimension tables) around it radially.

Stovepipe: A stovepipe system cannot easily integrate with any other system. 
This means we have multiple versions of the same data, violating the idea of a 
single version of the truth. 
See silo and islands of automation 

Strategy: Approach taken that will affect the overall direction of the 
organization and will establish the organization's future environment. 

Subject areas: Data Subject Area: Fundamental entities that make up the 
major components of the business, e.g. customer, product, employee. 

Function Subject Area: A business function or business activity, e.g. sales, 
order processing, inventory.

Suite (of products): A collection of software products from the same vendor -
either developed or bundled by that vendor. The idea is to provide a complete 
set of tools from modeling through to access and analysis. Range of functional 
software modules that interact with each other. Suites should eliminate 
integration complexity. 

Supply chain: The management of the components, manufacturing and 
distribution of a manufactured commodity. The supply chain management 
includes warehousing and tracking inventory. 

Systems integration: The art and science of integrating processes, functions, 
people and data so the end result is a seamless and tight knit system. 

System timestamp: A system timestamp is a timestamp that is generated by
a system operation. Examples are: record_create_date, last_update_date, ...

SWAT team: A small team of skilled and experienced practitioners who can 
pull a failing project out of the ditch. This team does not tolerate political 
interference as it makes decisions and takes actions to bring the project to 
fruition. 

Tactical: Approach taken to achieve a specific objective or to solve a specific
problem. 

Target: The database into which data will be loaded from a source database or 
file; the data store that is accessed by the users. 

Terabyte: 1000 Gigabytes. 

Third normal form: A database in which each attribute in the relationship is a 
fact about a key, the whole key and nothing but the key. Usually refers to a 
fully normalized structure. 

Tie (and foot): The process of validating the number of rows, summarizations, 
and monetary totals of the source data to the data loaded into the data 
warehouse. 
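
A minimal tie-and-foot check might compare row counts and monetary totals as
in the Python sketch below. The sample rows and the function name tie_and_foot
are hypothetical; Decimal is used so that monetary totals compare exactly.

```python
from decimal import Decimal

# Hypothetical source extract and the rows that arrived in the warehouse.
source = [("A1", Decimal("1500.00")), ("A2", Decimal("200.50")), ("A3", Decimal("99.50"))]
loaded = [("A1", Decimal("1500.00")), ("A2", Decimal("200.50")), ("A3", Decimal("99.50"))]


def tie_and_foot(src, tgt):
    """Compare row counts and monetary totals between source and target."""
    return {
        "rows_tie": len(src) == len(tgt),
        "totals_tie": sum(a for _, a in src) == sum(a for _, a in tgt),
    }


result = tie_and_foot(source, loaded)
short_load = tie_and_foot(source, loaded[:-1])  # simulate a dropped row
```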

Timely: Data is valuable and useful to analysts only if it represents 
organizational activities that are reasonably current. Timeliness is a function of 
the users' requirements for currency and is consistent with user expectations. 
Timeliness is usually measured by how soon the data is available after some 
distinctive end-of-period such as "two days after the close of the month." The 
act of getting the data to the users at the most opportune time. 

Time dimension: A table of descriptive attributes about the date/timestamp, 
e.g. Day of week, Month, Quarter, Season, Year, Century, Holiday, etc. 

Time variance: A characteristic of a data warehouse that defines the moment 
in time that the data or variant of the data is valid. If Order No. 123 has a value 
of $1,500.00 on Dec 1 and $1,700 on Dec 10, Dec 1 and Dec 10 show us the
time variance of Order No. 123.
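
The Order No. 123 example can be sketched in Python as a small history of
dated values. The year 2001 and the names order_123_history and value_as_of
are assumptions made for illustration; the text gives only the days and
amounts.

```python
from datetime import date

# Hypothetical history for Order No. 123: each entry is (effective date, value).
order_123_history = [
    (date(2001, 12, 1), 1500.00),
    (date(2001, 12, 10), 1700.00),
]


def value_as_of(history, as_of):
    """Return the value valid on as_of, or None if the order did not exist yet."""
    current = None
    for effective, value in sorted(history):
        if effective <= as_of:
            current = value
    return current
```

Because each value carries the date from which it is valid, a query can ask
what the order was worth on any given day instead of seeing only the latest
figure.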

Topology: The manner in which the components of a subject are arranged or 
interrelated. 

Total Cost of Ownership: The cost to the organization for the initial 
implementation and the maintenance of the system. 

Transformation: The manipulation of data to bring it into conformance with the 
business rules, domain rules, integrity rules, and with other data within the 
warehouse environment. 

Triage: The process by which projects or activities are prioritized to determine 
which should be attempted first, second, etc. and which projects or activities 
should never be done at all. This process applies to the cleansing process to 
determine which data should be cleaned first, second, etc. and which data 
should not be cleaned at all. Triage considers the value of cleansing, the 
complexity and the cost and the order in which the cleansing should be 
accomplished. 

Trickle feed: The process by which data updates the target database a little at 
a time. This is in contrast to massive updates that take place after the close of 
a period such as the day, month or quarter. The process of feeding data from 
one system to another in either real-time or small time intervals. 

UPC - Universal product code: A unique bar code embossed on every 
product used for inventory control. 

User: A knowledge worker, a business analyst, a statistician, or a business 
executive who will access the data in the data warehouse to perform some 
type of business analysis. 
See Business Users 

Value added: The notion of additional benefit being provided by some activity 
or service. 

Virtual enterprise data warehouse: An enterprise data warehouse 
constructed of multiple data marts and a request broker computer application. 
The data warehouse does not physically exist except through the formation
of the integrated data marts. 

Vision: The direction of the data warehouse - what it is intended to 
accomplish. 

Visionary: The person in the organization who articulates the data warehouse 
direction - what it is intended to accomplish. 

Visualization: The presentation of results in a format other than just numbers 
with a display that may include graphs, and charts making copious use of 
colors and figures.

VLDB (Very Large Database): The perception of what constitutes a VLDB
continues to grow. A one terabyte database would normally be considered to 
be a VLDB. 

Work Breakdown Structure: A detailed list of tasks to be performed on the 
project. 

Workload: The quantity of processing to include the machine cycles and the 
disk I/Os.